Complete all questions below. After completing the assignment, knit your document, and download both your .Rmd and knitted output. Upload your files for peer review.
For each response, include comments detailing your response and what each line does.
library(nycflights13)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Using the nycflights13 dataset, find all flights that departed in July, August, or September using the helper function between().
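# Keep only flights whose month falls between 7 and 9 inclusive (July through September)
flights_jul_aug_sep <- flights %>%
  filter(between(month, 7, 9))
# Print the filtered flights
flights_jul_aug_sep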
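As a quick check of how between() behaves, the sketch below compares it with an explicit pair of comparisons. The `toy` tibble is a hypothetical stand-in for the month column of flights, not part of the assignment data.

```r
library(dplyr)

# Hypothetical toy data: one row per calendar month
toy <- tibble(month = 1:12)

# between(x, left, right) includes both endpoints
with_between <- toy %>% filter(between(month, 7, 9))

# The same filter written with explicit comparisons
with_compare <- toy %>% filter(month >= 7 & month <= 9)

with_between$month # 7 8 9
with_compare$month # 7 8 9
```

Both versions keep the same rows, so between() is mainly a readability shortcut here.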
Using the nycflights13 dataset, sort flights to find the 10 flights that flew the furthest. Put them in order of fastest to slowest.
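longest_flights <- flights %>%
  arrange(desc(distance)) %>% # sort by distance, furthest first
  head(10) %>% # keep the 10 longest flights
  mutate(speed = distance / (air_time / 60)) %>% # calculate speed in miles per hour
  arrange(desc(speed)) # order from fastest to slowest
# Print the result
longest_flights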
Using the nycflights13 dataset, calculate a new variable called “hr_delay” and arrange the flights dataset in order of the arrival delays in hours (longest delays at the top). Put the new variable you created just before the departure time. Hint: use the experimental argument .before.
flights_with_hr_delay <- flights %>%
  # convert arrival delay from minutes to hours and place it just before dep_time
  mutate(hr_delay = arr_delay / 60, .before = dep_time) %>%
  arrange(desc(hr_delay)) # longest delays at the top
# Print the result
flights_with_hr_delay
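To illustrate the .before argument mentioned in the hint, here is a minimal sketch on a hypothetical three-row tibble (`toy` and its values are made up for illustration). It assumes dplyr 1.0.0 or later, where mutate() accepts .before.

```r
library(dplyr)

# Hypothetical toy data with a few of the flights columns
toy <- tibble(
  day = 1:3,
  dep_time = c(517, 533, 542),
  arr_delay = c(120, -30, 45)
)

# .before = dep_time inserts the new column immediately before dep_time
toy_hr <- toy %>%
  mutate(hr_delay = arr_delay / 60, .before = dep_time)

names(toy_hr) # "day" "hr_delay" "dep_time" "arr_delay"
```

Without .before, mutate() would append hr_delay as the last column.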
Using the nycflights13 dataset, find the most popular destinations (those with more than 2000 flights) and show the destination, the date information, and the carrier. Then show just the number of flights for each popular destination.
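popular_destinations <- flights %>%
  group_by(dest) %>% # one group per destination
  filter(n() > 2000) %>% # keep destinations with more than 2000 flights
  select(dest, year, month, day, carrier) # destination, date info, carrier
number_of_flights <- popular_destinations %>%
  group_by(dest) %>% # group again by destination
  summarise(num_flights = n()) # count the flights to each popular destination
# Print both results
list(popular_destinations, number_of_flights)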
## [[1]]
## # A tibble: 302,969 × 5
## # Groups:   dest [46]
##    dest   year month   day carrier
##    <chr> <int> <int> <int> <chr>
##  1 IAH    2013     1     1 UA
##  2 IAH    2013     1     1 UA
##  3 MIA    2013     1     1 AA
##  4 ATL    2013     1     1 DL
##  5 ORD    2013     1     1 UA
##  6 FLL    2013     1     1 B6
##  7 IAD    2013     1     1 EV
##  8 MCO    2013     1     1 B6
##  9 ORD    2013     1     1 AA
## 10 PBI    2013     1     1 B6
## # ℹ 302,959 more rows
##
## [[2]]
## # A tibble: 46 × 2
##    dest  num_flights
##    <chr>       <int>
##  1 ATL         17215
##  2 AUS          2439
##  3 BNA          6333
##  4 BOS         15508
##  5 BTV          2589
##  6 BUF          4681
##  7 CHS          2884
##  8 CLE          4573
##  9 CLT         14064
## 10 CMH          3524
## # ℹ 36 more rows
Using the nycflights13 dataset, find the flight information (flight number, origin, destination, carrier, number of flights in the year, and percent late) for the flight numbers with the highest percentage of arrival delays. Only include the flight numbers that have over 100 flights in the year.
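flight_info <- flights %>%
  group_by(flight) %>% # one group per flight number
  filter(n() > 100) %>% # keep flight numbers with over 100 flights in the year
  summarise(
    origin = first(origin), # first origin recorded for the flight number
    dest = first(dest), # first destination recorded
    carrier = first(carrier), # first carrier recorded
    num_flights = n(), # number of flights in the year
    # percent of flights that arrived late, ignoring flights with a missing arr_delay
    percent_late = mean(arr_delay > 0, na.rm = TRUE) * 100
  ) %>%
  arrange(desc(percent_late)) # highest percentage of late arrivals first
# Print the result
flight_info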
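One point worth checking when computing a percentage from arr_delay is missing values, since arr_delay is NA for flights with no recorded arrival. The base R sketch below uses a hypothetical logical vector `late` to show how a single NA affects mean().

```r
# Hypothetical late/on-time indicators; NA stands for a flight with no recorded arrival delay
late <- c(TRUE, FALSE, NA, TRUE)

# With the default na.rm = FALSE, one NA makes the whole result NA
pct_with_na <- mean(late) * 100

# Dropping NA values first gives the share of late flights among those that arrived
pct_without_na <- mean(late, na.rm = TRUE) * 100

pct_with_na    # NA
pct_without_na # about 66.7
```

Whether cancelled flights should be dropped or counted separately is a judgment call; dropping them is only one reasonable option.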